Abstract

We construct codes over the ring $\mathbb{F}_2 + u\mathbb{F}_2$ with $u^2 = 0$. These codes are designed for
use in DNA computing applications. The codes obtained satisfy the reverse-complement
constraint and the GC-content constraint, and avoid secondary structure. They
are derived from the cyclic reversible-complement codes over the ring $\mathbb{F}_2 + u\mathbb{F}_2$. We
also construct an infinite family of BCH DNA codes.

Deoxyribonucleic acid (DNA) contains the genetic program for the biological development
of life. DNA is formed by strands linked together and twisted in the shape of a double
helix. Each strand is a sequence of four possible nucleotides: two purines, adenine (A) and
guanine (G), and two pyrimidines, thymine (T) and cytosine (C). The ends of a DNA strand
are chemically polar with 5' and 3' ends, which implies that the strands are oriented.
Hybridization, known as base pairing, occurs when a strand binds to another strand, forming
a double strand of DNA.

The strands are linked following the Watson-Crick model. Every (A) is linked with a
(T), and every (C) with a (G), and vice versa. We denote the complement of $X$ by $\overline{X}$,
i.e., $\overline{A} = T$, $\overline{T} = A$, $\overline{G} = C$ and $\overline{C} = G$. The pairing is done in the opposite direction
and the reverse order. For instance, the Watson-Crick complementary (WCC) strand of
3'-ACTTAGA-5' is the strand 5'-TCTAAGT-3'. Non-specific hybridization occurs
when hybridization between a DNA strand and its Watson-Crick complement does not take
place, or when a DNA strand hybridizes with the reverse of a distinct strand. Another
non-specific hybridization is when a strand folds back onto itself, forming a so-called
"secondary structure".

DNA computing is the fusion of the world of genetic data analysis and the science of
computation in order to tackle computationally difficult problems. This new area was born
in 1994 when Adleman [3] solved an instance of a hard (NP-complete) computational problem,
namely the directed traveling salesman problem on a graph with seven nodes. His
approach was based on the WCC property of DNA strands. Since then, numerous studies
have built on this research and expanded DNA computing to solve other mathematical
problems [1], [6], [21]. Furthermore, since there are $4^n$ possible single DNA strands of length
$n$ which can be quickly and cheaply synthesized, Mansuripur et al. [22] showed that DNA
codewords can be used as ultra high density storage media. Other applications make use of
the DNA hybridization phenomenon [2H].

A block code is called a DNA code if it satisfies some of the following constraints: 

1. the Hamming constraint for a distance d, 

2. the reverse-complement constraint, 

3. the reverse constraint, and 

4. the fixed GC-constraint. 

The purpose of the first three constraints is to avoid undesirable hybridization between different
strands. The fixed GC-constraint ensures that all codewords have similar thermodynamic
characteristics, which allows parallel operations on DNA sequences. Milenkovic and Kashyap [24]
proved that when designing a DNA code a fifth constraint should be added in order to make
secondary structure less likely to happen. Secondary structure causes codewords to become
computationally inactive, as the codewords have low chemical activity. This degrades
the read-back mechanism in a DNA storage system by 30%, as reported by Mansuripur et
al. [22]. Milenkovic and Kashyap [24] used the Nussinov-Jacobson algorithm [25] to prove
that the presence of a cyclic structure reduces the complexity of testing DNA codes for
secondary structure, and also simplifies DNA sequence fabrication. Another advantage of
the design of cyclic codes, as pointed out by Siap et al. [28], is that the complexity of the
dynamic programming algorithm to find the largest common subsequence between any two
codewords in a cyclic code will be less than that of any other codes. There have been numerous
papers on the design of DNA codes [TJ [2], [17], [28]. Gaborit and King [TTJ and Abualrub
et al. [2] constructed DNA codes over GF(4). Siap et al. [28] constructed cyclic DNA codes
considering the GC-content constraint over $\mathbb{F}_2[u]/(u^2 - 1) = \{0, 1, u, u+1\}$, where $u^2 = 1$,
and used the deletion distance.

In this paper, we construct cyclic linear codes suitable for DNA computing. They are
derived from cyclic reverse-complement codes over the ring $R = \mathbb{F}_2 + u\mathbb{F}_2$, where $u^2 = 0$.
We give infinite families of DNA codes with either fixed GC-content, or with few weights
in order to obtain DNA codes with a large fixed GC-content after removing codewords
that violate the GC-content constraint. Since our codes are cyclic, this can be done easily
as noted by Abualrub et al. [2]. Furthermore, we will benefit from the fact that this ring
contains $\mathbb{F}_2$ as a subring and has properties in common with $\mathbb{Z}_4$. In addition, techniques for
implementation and decoding have been developed [7]. These codes can also correct certain
burst errors. We also construct BCH codes over this ring, and BCH DNA codes. BCH
codes over fields are well known, hence we translate the properties of BCH codes over $\mathbb{F}_2$
to $R$. Previously Shankar [29] constructed BCH codes over the rings $\mathbb{Z}_m$. Calderbank and
Sloane [9] gave BCH codes over $\mathbb{Z}_{p^a}$ as a Hensel lift of BCH codes from fields to rings. We
construct BCH codes over the ring $R$ without using a lift. Furthermore, decoding algorithms
exist such as that given by Bonnecaze and Udaya [8]. For the reasons given above, these
codes are very appropriate for DNA computing.

1 Preliminaries 

The ring considered here is the ring $R = \mathbb{F}_2 + u\mathbb{F}_2$, where $u^2 = 0$. A linear code over this
ring is a module over $R$. Codes over this ring were introduced by Bachoc [5] and studied
by Bonnecaze and Udaya [HE], Dougherty et al. [12j[T5], Gulliver and Harada [El[20], and 
more recently by Abualrub and Siap [2]. 

The ring $R$ contains four elements $\{0, 1, u, 1 + u\}$. This is a local commutative ring with
characteristic 2 and unique maximal ideal $(u)$. It is also a finite chain ring: its ideals
form the unique chain $(0) \subsetneq (u) \subsetneq R$. The field $\mathbb{F}_2$ can be seen as a subring of $R$. This is an
interesting fact which will be useful later.

For linear codes over a chain ring, the rank of $C$, denoted $\mathrm{rank}(C)$, is defined as the
minimum number of generators of $C$. In this paper, we only consider codes with odd length.
Writing $n_a(x)$ for the number of coordinates of $x$ equal to $a \in R$, we define the Hamming
weight of a codeword $x$ in $C$ as $w_H(x) = n_1(x) + n_u(x) + n_{u+1}(x)$,
the Lee weight of $x$ as $w_L(x) = n_1(x) + 2n_u(x) + n_{u+1}(x)$, and the Euclidean weight as
$w_E(x) = n_1(x) + 4n_u(x) + n_{u+1}(x)$. The Hamming, Lee and Euclidean distances $d_H(x,y)$,
$d_L(x,y)$, $d_E(x,y)$ between two vectors $x$ and $y$ are $w_H(x - y)$, $w_L(x - y)$ and $w_E(x - y)$,
respectively. The minimum Hamming, Lee and Euclidean weights, $d_H$, $d_L$ and $d_E$ of $C$ are
the smallest Hamming, Lee and Euclidean weights among all nonzero codewords of $C$.
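The three weights can be illustrated with a minimal Python sketch. The integer encoding of $R$ used here ($a + ub$ stored as `a + 2*b`, so `0, 1, 2, 3` stand for $0, 1, u, 1+u$) and all function names are assumptions made for illustration only; they do not come from the paper.

```python
# Hypothetical encoding of R = F_2 + uF_2: the element a + u*b is the integer a + 2*b,
# so 0 -> 0, 1 -> 1, u -> 2, 1+u -> 3. Addition in R is then bitwise XOR.
LEE = (0, 1, 2, 1)        # Lee weight of 0, 1, u, 1+u
EUCLIDEAN = (0, 1, 4, 1)  # Euclidean weight of 0, 1, u, 1+u

def hamming_weight(x):
    """Number of nonzero coordinates of x."""
    return sum(1 for a in x if a != 0)

def lee_weight(x):
    """n_1(x) + 2 n_u(x) + n_{u+1}(x)."""
    return sum(LEE[a] for a in x)

def euclidean_weight(x):
    """n_1(x) + 4 n_u(x) + n_{u+1}(x)."""
    return sum(EUCLIDEAN[a] for a in x)

def lee_distance(x, y):
    """Lee weight of x - y; subtraction equals addition in characteristic 2."""
    return lee_weight([a ^ b for a, b in zip(x, y)])

word = [1, 2, 0, 3, 2]  # the vector (1, u, 0, 1+u, u)
print(hamming_weight(word), lee_weight(word), euclidean_weight(word))
```

For the sample vector the sketch also respects the inequalities $w_L \le 2 w_H$ and $w_E \le 4 w_H$ that hold for any vector over $R$.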

The elements $\{0, u, u + 1, 1\}$ of $R$ are in one to one correspondence with the nucleotide
DNA bases A, T, C, G, such that $0 \to A$, $u \to T$, $u + 1 \to C$ and $1 \to G$. We remark that
for all $x \in R$, we have

$x + \overline{x} = u$. (1)

We define the reverse of $x = x_0 x_1 \cdots x_{n-1}$ to be $x^r = x_{n-1} x_{n-2} \cdots x_1 x_0$. The complement of
the codeword $x$ is the vector $x^c = \overline{x_0}\,\overline{x_1} \cdots \overline{x_{n-1}}$,
and the reverse complement
(also called the Watson-Crick complement) is $x^{rc} = \overline{x_{n-1}}\,\overline{x_{n-2}} \cdots \overline{x_1}\,\overline{x_0}$.

A linear code $C$ over $R$ is said to be cyclic if it is invariant under a cyclic shift, i.e.,
$(x_{n-1}, x_0, \ldots, x_{n-2}) \in C$ provided the codeword $(x_0, x_1, \ldots, x_{n-2}, x_{n-1})$ is in $C$. A code $C$ is
said to satisfy the reverse constraint if $H(x^r, y) \geq d$ for all $x, y \in C$, including $x = y$.
A code $C$ is said to satisfy the Hamming constraint if for any two different codewords
$x, y \in C$, $H(x, y) \geq d$. A code $C$ is said to satisfy the reverse-complement constraint
if for any two codewords $x, y \in C$ (where $x$ might equal $y$), $H(x^{rc}, y) \geq d$. A code $C$ is said
to satisfy the fixed GC-content constraint if every codeword $x \in C$ contains the same
number of G and C elements. A code is called a DNA code if it satisfies some or all of the
conditions above.
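The four constraints can be checked by brute force on a small set of DNA words. The Python sketch below is only an illustration: it reads each distance constraint as "at least $d$" (an assumption about the intended inequality), and the function names and the toy code are hypothetical.

```python
from itertools import combinations

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def hamming(x, y):
    """Hamming distance between two equal-length strings."""
    return sum(a != b for a, b in zip(x, y))

def rc(x):
    """Reverse complement (Watson-Crick complement) of a DNA string."""
    return x[::-1].translate(COMPLEMENT)

def satisfies_hamming(code, d):
    return all(hamming(x, y) >= d for x, y in combinations(code, 2))

def satisfies_reverse(code, d):
    # x = y is allowed, as in the definition.
    return all(hamming(x[::-1], y) >= d for x in code for y in code)

def satisfies_reverse_complement(code, d):
    return all(hamming(rc(x), y) >= d for x in code for y in code)

def has_fixed_gc_content(code):
    return len({sum(b in "GC" for b in x) for x in code}) <= 1

toy = ["AACC", "ACAC"]  # toy example, not a code from the paper
print(satisfies_hamming(toy, 2), satisfies_reverse(toy, 2),
      satisfies_reverse_complement(toy, 4), has_fixed_gc_content(toy))
```

The check is quadratic in the number of codewords, so it is only practical for small codes; for the cyclic codes studied below the algebraic criteria replace this exhaustive search.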



2 Cyclic Codes over R 



In this section, we consider the cyclic codes over $R$ since our goal is the study of cyclic DNA
codes. The results of the reference above are reviewed and extended. We also introduce the
concept of BCH codes over $R$. Only codes of odd length $n$ are examined.

The cyclic codes of odd length $n$ over $R$ are principal ideals of the ring $R_n = R[x]/(x^n - 1)$.
Hence knowing the factorization of $x^n - 1$ is important.

Lemma 2.1 ([18, Theorem 3.3]) Let $R$ be a finite chain ring with residue field $K$ of
characteristic $p$. Let $n$ be an integer such that $(n, p) = 1$. Then $x^n - 1$ factors uniquely into
basic irreducible polynomials. Furthermore, there is a one to one correspondence between
the factors of $x^n - 1$ over $R$ and the factors of $x^n - 1$ over $K$.

From Lemma 2.1 we have a one to one correspondence between the factors of $x^n - 1$ in $R$
and the factors of $x^n - 1$ in $\mathbb{F}_2$. However, since $\mathbb{F}_2 \subset R$, the factors of $x^n - 1$ in $R$ are the
same as in $\mathbb{F}_2$. This gives the following lemma.

Lemma 2.2 If $n$ is odd then the factorization of $x^n - 1$ into irreducible polynomials over $R$
is the same as the factorization over $\mathbb{F}_2$.
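Since the irreducible factors of $x^n - 1$ over $\mathbb{F}_2$ correspond to the 2-cyclotomic classes modulo $n$, the shape of the factorization can be read off without any polynomial arithmetic. The following Python sketch (function names are assumptions for illustration) lists the classes; the size of each class is the degree of the corresponding irreducible factor.

```python
def cyclotomic_cosets(n, q=2):
    """Return the q-cyclotomic cosets modulo n as sorted lists.

    Assumes gcd(n, q) = 1, so that multiplication by q permutes Z_n.
    """
    seen, cosets = set(), []
    for s in range(n):
        if s in seen:
            continue
        coset, t = [], s
        while t not in coset:
            coset.append(t)
            t = (t * q) % n
        seen.update(coset)
        cosets.append(sorted(coset))
    return cosets

def factor_degrees(n):
    """Degrees of the irreducible factors of x^n - 1 over F_2 (n odd)."""
    return [len(c) for c in cyclotomic_cosets(n)]

print(cyclotomic_cosets(7))  # [[0], [1, 2, 4], [3, 5, 6]]
print(factor_degrees(15))    # [1, 4, 4, 2, 4]
```

For $n = 7$ this corresponds to $x^7 - 1$ splitting into one linear factor and two cubic factors.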

Theorem 2.3 Let $C$ be a cyclic code over $R$. Then $R_n$ is a principal ideal ring and there
exist unique pairwise coprime polynomials $F_0, F_1, F_2$ in $\mathbb{F}_2[x]$ such that $F_0 F_1 F_2 = x^n - 1$,
and

$C = (F_0 F_2 | u F_0) = (F_0 F_2 + u F_0)$. (2)

Moreover

$|C| = 4^{\deg F_1} 2^{\deg F_2}$ (3)

and

$\mathrm{rank}(C) = \deg F_1 + \deg F_2$. (4)
Proof. The proof follows from Lemma 2.2 and [Tl| Theorems 3.4 and 3.5]. □

From now on, for simplicity of notation, we will write the cyclic code given in (2) as

$C = (f_0 | u f_1) = (f_0 + u f_1)$, (5)
such that $f_1 | f_0 | x^n - 1$. It is clear that $f_0 = F_0 F_2$ and $F_0 = f_1$. Hence from (4), the rank of $C$ is
equal to

$r = \mathrm{rank}(C) = n - \deg f_1$. (6)

There are two binary cyclic codes associated with a cyclic code $C$ over $R$: the binary codes
$\mathrm{Res}(C) = \{x \in \mathbb{F}_2^n \mid \exists y \in \mathbb{F}_2^n, x + uy \in C\}$ and $\mathrm{Tor}(C) = \{x \in \mathbb{F}_2^n \mid ux \in C\}$, called respectively
the residue code and the torsion code. It has been proven that [HI p 2150] $\mathrm{Res}(C) = (f_0)$
and $\mathrm{Tor}(C) = (f_1)$.

Now we will consider the minimum distance of codes over $R$. First we prove the following
lemma.

Lemma 2.4 If $C$ is a code over $R$, then $d_L(C) \leq 2d_H(C)$ and $d_E(C) \leq 4d_H(C)$.

Proof. Given a vector with Hamming weight $d_H$, the highest possible Lee weight is obtained
if all the nonzero coordinates are $u$, in which case it has Lee weight $2d_H$. The same applies
for the Euclidean weight except that this vector has Euclidean weight $4d_H$. □



Theorem 2.5 Let $C = (f_0 | u f_1)$ be a cyclic code over $R$ of odd length $n$. Then the minimum
distance of $C$ satisfies the following:

(i) $d_H(C) = d_H(\mathrm{Tor}(C)) = d_H((f_1))$,

(ii) $d_L(C) \leq \min(d_H((f_0)), 2d_H((f_1)))$,

(iii) $d_H(C) \leq \deg f_1 + 1$,

(iv) $\lfloor d_L(C)/2 \rfloor \leq \deg f_1 + 1$,

(v) $\lfloor d_E(C)/4 \rfloor \leq \deg f_1 + 1$.

Proof. From [31, Theorem 4.2], we have that $d_H(C) = d_H(\mathrm{Tor}(C))$. Part (ii) comes from
the fact that the codes $\mathrm{Tor}(C)$ and $\mathrm{Res}(C)$ are binary cyclic codes generated by $f_1$ and $f_0$,
respectively, and satisfy $u(f_1) \subset C$ and $(f_0) \subset C$. The dimension of $\mathrm{Tor}(C)$ is $n - \deg(f_1)$.
By the Singleton bound we have $d_H(\mathrm{Tor}(C)) \leq \deg(f_1) + 1$. Hence Part (iii) follows from
Part (i). Parts (iv) and (v) follow from Part (iii) and Lemma 2.4. □



2.1 BCH Codes over R 

A BCH code of length $n$ and designed distance $\delta$ over a field $\mathbb{F}_q$, denoted by $BCH(n, \delta)_q$, is
defined as a cyclic code generated by $\mathrm{lcm}(M_1, M_2, \ldots, M_{\delta-1})$, where the $M_i$ are the minimal
polynomial factors of $x^n - 1$ over $\mathbb{F}_q$. The definition of BCH codes over $\mathbb{F}_2$ can be extended
to the ring $R = \mathbb{F}_2 + u\mathbb{F}_2$, $u^2 = 0$. This follows from Lemma 2.2: if $x^n - 1 = \prod_i M_i$ is the
unique factorization of the polynomial $x^n - 1$ over $R$, then the $M_i$ are minimal polynomials over
$\mathbb{F}_2$, each of which corresponds to a cyclotomic class modulo $n$.

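Definition 2.6 Let $n, \delta_0, \delta_1$ be positive integers such that $1 \leq \delta_1 \leq \delta_0 \leq n - 1$. We define the
BCH code of length $n$ and designed distance $(\delta_0, \delta_1)$ over $R$ to be the cyclic code $(g_{\delta_0} | u g_{\delta_1})$,
with $g_{\delta_j} = \mathrm{lcm}(M_i)$, $1 \leq i \leq \delta_j - 1$, where $0 \leq j \leq 1$ and $\delta_1 \leq \delta_0$. We denote this code by
$BCH(n, \delta_0, \delta_1)$.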

We have the following results concerning the rank and minimum distance of the BCH codes 
over R. 

Theorem 2.7 Let $C = BCH(n, \delta_0, \delta_1)$ be a BCH code over $R$ of length $n$ and designed
distance $(\delta_0, \delta_1)$. Then the following holds:

(i) $\min(\delta_0, 2\delta_1) \leq d_L(C) \leq \min(d_H(BCH(n, \delta_0)), 2d_H(BCH(n, \delta_1)))$.

(ii) $\delta_i$, $0 \leq i \leq 1$, can be assumed to be odd.

(iii) If $\delta_1 = 2w + 1$, then $\mathrm{rank}(C) \geq n - \mathrm{ord}_n(2) w$.

(iv) If $n = 2^m - 1$, $\delta_1 = 2w + 1$, and $\delta_1 < 2^{\lceil m/2 \rceil} + 3$, then $\mathrm{rank}(C) = 2^m - 1 - mw$.

(v) If $n = 2^m - 1$ and $\delta_1 = 2^h - 1$, then $d_H(C) = \delta_1$.

(vi) If $n = 2^m - 1$, then $d_H(C) \leq 2\delta_1 - 1$.

(vii) If $n = a\delta_1$, then $d_H(C) = \delta_1$.

(viii) If $n = 2^m - 1$ and $\delta_1 = 2w + 1$, then if $2^{mw} < \sum_{i=0}^{w+1} \binom{2^m - 1}{i}$, then $d_H = 2w + 1$.

Proof. Part (i) follows from the BCH-like bound for the Lee distance of cyclic codes
over $R$ given by P, Theorem 7] and from Part (ii) of Theorem 2.5. The other assertions
follow from Part (i), Theorem 2.5 and the results for BCH codes over fields in [30, Chap.
9]. □

Example 2.8 For $n = 63$, $\delta_0 = 11$, and $\delta_1 = 9$ we have a $BCH(63, 11, 9)$ code over $R$, with
$2^{75}$ codewords, minimum Lee distance 11, and minimum Hamming weight 9.
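The number of codewords in such an example can be cross-checked from cyclotomic classes alone. The Python sketch below assumes that a cyclic code $(f_0 | u f_1)$ over $R$ has $2^{(n - \deg f_0) + (n - \deg f_1)}$ codewords, i.e. the product of the sizes of its residue and torsion codes; this counting rule and the function names are assumptions of the sketch, not statements quoted from the paper.

```python
def coset_of(s, n):
    """2-cyclotomic coset of s modulo n (n odd)."""
    coset, t = set(), s % n
    while t not in coset:
        coset.add(t)
        t = (2 * t) % n
    return coset

def bch_generator_degree(n, delta):
    """Degree of lcm(M_1, ..., M_{delta-1}): the number of exponents in the
    union of the cyclotomic cosets of 1, ..., delta - 1."""
    exponents = set()
    for i in range(1, delta):
        exponents |= coset_of(i, n)
    return len(exponents)

def log2_size(n, delta0, delta1):
    """log_2 |BCH(n, delta0, delta1)| under the counting assumption above."""
    dim_res = n - bch_generator_degree(n, delta0)  # residue code (g_{delta0})
    dim_tor = n - bch_generator_degree(n, delta1)  # torsion code (g_{delta1})
    return dim_res + dim_tor

print(log2_size(63, 11, 9))  # 75
```

For $n = 63$ the two generator degrees are 27 and 24, which gives $36 + 39 = 75$, in agreement with the $2^{75}$ codewords of Example 2.8.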

3 DNA Codes 

This section presents the design of DNA codes. First we give the following definition. 

Definition 3.1 A code $C$ is said to be reversible, respectively complement, if it satisfies
$x^r \in C$ for all $x \in C$, respectively $x^c \in C$ for all $x \in C$. A code $C$ is said to be
reversible-complement if $x^{rc} \in C$ for all $x \in C$. A reversible-complement cyclic code is a cyclic code
which is also reversible-complement.

3.1 The Reverse Constraint

A sufficient condition for a code to satisfy the reverse constraint is to be invariant under the
reverse permutation $\sigma_r$ given by $\sigma_r(c_0, c_1, \ldots, c_{n-1}) = (c_{n-1}, \ldots, c_1, c_0)$. If $c(x) = c_0 + c_1 x +
\cdots + c_{n-1}x^{n-1}$ is a codeword of a cyclic code, we have $\sigma_r(c(x)) = x^{n-1}c(x^{-1})$. Codes invariant
under the action of $\sigma_r$ are called reversible.

Definition 3.2 For $f(x) \in R[x]$, let $f(x)^* = x^{\deg f} f(1/x)$ be the reciprocal polynomial of
$f(x)$. If equality holds between $f(x)$ and $f(x)^*$, we say that the polynomial is self-reciprocal.

Lemma 3.3 ([2, Lemma 4]) Let $f(x)$ and $g(x)$ be two polynomials in $R[x]$ with $\deg f(x) \geq
\deg g(x)$. Then the following holds.

(i) $[f(x)g(x)]^* = f(x)^* g(x)^*$

(ii) $[f(x) + g(x)]^* = f(x)^* + x^{\deg f - \deg g} g(x)^*$

The following result due to Massey [23, Theorem 1] characterizes the reversible codes
over fields.

Lemma 3.4 A cyclic code over a finite field $\mathbb{F}_q$ generated by a monic polynomial $g(x)$ is
reversible if and only if $g(x)$ is self-reciprocal.

A cyclic code $C = (f_0 | u f_1)$ over $R$ is said to be free if it satisfies $C = (f_0)$, i.e., $f_1 = f_0$. By
a proof similar to that of Lemma 3.4 we have the following result.

Lemma 3.5 Let $C = (f(x))$ be a free cyclic code over $R$ generated by a monic polynomial
$f(x) | x^n - 1$. Then $C$ is reversible if and only if $f(x)$ is self-reciprocal.
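Testing whether a binary generator polynomial is self-reciprocal is a purely mechanical check. In the following Python sketch a polynomial over $\mathbb{F}_2$ is stored as an integer whose bit $i$ is the coefficient of $x^i$; this representation and the function names are assumptions made for illustration.

```python
def degree(f):
    """Degree of a binary polynomial stored as an int (bit i = coefficient of x^i)."""
    return f.bit_length() - 1

def reciprocal(f):
    """x^{deg f} f(1/x): the coefficient list of f read in reverse order."""
    d = degree(f)
    return sum(((f >> i) & 1) << (d - i) for i in range(d + 1))

def is_self_reciprocal(f):
    return f != 0 and reciprocal(f) == f

# x^4 + x^3 + x^2 + x + 1 is self-reciprocal; x^4 + x + 1 is not.
print(is_self_reciprocal(0b11111), is_self_reciprocal(0b10011))
print(bin(reciprocal(0b10011)))  # 0b11001, i.e. x^4 + x^3 + 1
```

The sketch is intended for divisors of $x^n - 1$, which always have a nonzero constant term, so the reciprocal has the same degree as the polynomial itself.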

Conversely, if the code is not free the situation is different as we prove in the following 
theorem. 

Theorem 3.6 Let $C = (f_0 | u f_1)$ be a cyclic code of odd length $n$. Then $C$ is reversible if and
only if $f_0$ and $f_1$ are self-reciprocal.

Proof. We have a natural ring morphism $\Psi : R \to \mathbb{F}_2$ defined by $\Psi(a) = a^2 \bmod 2$. Then
$\Psi$ can be extended as follows: $\Phi : C \to \mathbb{F}_2[x]/(x^n - 1)$ defined by

$\Phi(c_0 + c_1 x + \cdots + c_{n-1}x^{n-1}) = \Psi(c_0) + \Psi(c_1)x + \cdots + \Psi(c_{n-1})x^{n-1}$.

From [2], we have the ideal $\ker(\Phi) = (u f_1)$ and $\Phi(C) = (f_0)$. Note that the last ideal is
in $\mathbb{F}_2[x]$. Since we have assumed that $C$ is reversible, $\Phi(C) = (f_0)$ is also reversible.
Hence from Lemma 3.4 the polynomial $f_0$ is self-reciprocal. Since $f_1$ is a binary polynomial
that divides $f_0$, there exists a polynomial $g \in \mathbb{F}_2[x]$ such that $f_0 = f_1 g$. We have that
$f_0^* = (f_1 g)^* = f_1^* g^* = f_0 = f_1 g$. Since $f_1^*$ and $f_1$ are in $\mathbb{F}_2[x]$ with the same leading coefficient,
the same degree and the same constant term, and the polynomial $f_0 | x^n - 1$ has simple roots,
it follows that $f_1 = f_1^*$ and $g = g^*$.

Assume now that $f_0$ and $f_1$ are self-reciprocal, and let $c(x)$ be a codeword of $C$. Then there
exist $a_0(x)$ and $a_1(x)$ in $R[x]$ such that $c(x) = a_0(x) f_0(x) + a_1(x) u f_1(x)$. Using Lemma 3.3
and the fact that $f_0(x)$ and $f_1(x)$ are self-reciprocal, $c(x)^* = a_0(x)^* f_0(x) + a_1(x)^* u x^m f_1(x)$,
which means that $c(x)^*$ is in $C$. Since the code $C$ is cyclic, $x^{n-r-1} c^*(x) = x^{n-1} c(x^{-1}) \in C$,
which means that the reverse permutation leaves the code $C$ invariant. Hence it is reversible. □

3.2 The Reverse- Complement Constraint 

From Definition 3.1 we have that a linear code which is reversible-complement satisfies the
reverse-complement constraint.

Lemma 3.7 If $C$ is a reversible-complement cyclic code, then $C$ contains the codeword

$u\mathbf{1}(x) = u + ux + \cdots + ux^{n-1}$.

Proof. Since $C$ is linear, $(0, \ldots, 0) \in C$. Also, $C$ is reversible-complement, so that
$(0, \ldots, 0)^{rc} = (u, \ldots, u) \in C$. The last codeword corresponds to the polynomial $u\mathbf{1}(x) =
u + ux + \cdots + ux^{n-1}$. □

Theorem 3.8 Let $C = (f_0 + u f_1) = (f_0 | u f_1)$ be a cyclic code over $R$ of odd length $n$, with
$f_1 | f_0 | x^n - 1$ in $\mathbb{F}_2[x]$. If $C$ is a reversible-complement code, then $u\mathbf{1}(x) \in C$, and $f_0(x)$ and
$f_1(x)$ are self-reciprocal.

Proof. From Lemma 3.7 we have $u\mathbf{1}(x) \in C$. Now, let $f_0(x) = a_0 + a_1 x + \cdots + a_r x^r$. Since
$f_0 \in \mathbb{F}_2[x]$ and $f_0 | x^n - 1$, we have $f_0(x) = 1 + a_1 x + \cdots + a_{r-1}x^{r-1} + x^r$. The vector representation
of $f_0(x)$ is $v = (1, a_1, \ldots, a_{r-1}, 1, 0, \ldots, 0)$. Hence
$v^{rc} = (\overline{0}, \ldots, \overline{0}, \overline{1}, \overline{a_{r-1}}, \ldots, \overline{a_1}, \overline{1}) \in C$, and
$f_0^{rc}(x) = u + ux + \cdots + ux^{n-r-2} + \overline{1}x^{n-r-1} + \overline{a_{r-1}}x^{n-r} + \cdots + \overline{a_1}x^{n-2} + \overline{1}x^{n-1} \in C$. Since $C$ is
linear, we have $f_0^{rc}(x) + u\mathbf{1}(x) \in C$. Using (1) and the fact that the characteristic of $R$ is 2
we obtain

$f_0^{rc}(x) + u\mathbf{1}(x) = x^{n-r-1}(1 + a_{r-1}x + \cdots + a_1 x^{r-1} + x^r) \in C$.

Now multiplying $f_0^{rc}(x) + u\mathbf{1}(x)$ by $x^{r+1}$ and using the fact that this operation is modulo
$x^n - 1$, we obtain $f_0(x)^* = 1 + a_{r-1}x + \cdots + a_1 x^{r-1} + x^r \in C$. Since $C = (f_0 | u f_1)$, there
exist $k_0(x), k_1(x) \in R[x]$ such that $f_0(x)^* = k_0(x) f_0(x) + u k_1(x) f_1(x)$. Multiplying both sides
of the previous equality by $u$ gives

$u f_0(x)^* = u k_0(x) f_0(x)$,

but since $f_0(x)^*, f_0(x) \in \mathbb{F}_2[x]$ have the same degree, the same leading coefficient and the
same constant term, it must be that $k_0(x) = 1$. This means that $f_0(x)$ is self-reciprocal.
Now let $u f_1(x) = u(1 + b_1 x + \cdots + b_{s-1}x^{s-1} + x^s)$. Then

$(u f_1(x))^{rc} = u + ux + \cdots + ux^{n-s-2} + \overline{u}x^{n-s-1} + \overline{u b_{s-1}}x^{n-s} + \cdots + \overline{u b_1}x^{n-2} + \overline{u}x^{n-1} \in C$

and hence $(u f_1(x))^{rc} + u\mathbf{1}(x) \in C$. Using (1) and the fact that the characteristic of $R$ is 2 we
obtain that the last polynomial is equal to $ux^{n-s-1} + u b_{s-1}x^{n-s} + \cdots + u b_1 x^{n-2} + ux^{n-1}$.
Hence $u f_1^* \in C$, and as for $f_0$ we obtain that $f_1(x)^* = f_1(x)$. □
Now we prove that the condition given by Theorem 3.8 is also sufficient.

Theorem 3.9 Suppose $C = (f_0 | u f_1)$ is a cyclic code of odd length $n$ over $R$ with $f_1 | f_0 | (x^n - 1)$
in $\mathbb{F}_2[x]$. If $u + ux + \cdots + ux^{n-1} \in C$ and $f_0$, $f_1$ are self-reciprocal, then $C$ is a
reversible-complement code.

Proof. Let $c(x) \in C$. We must prove that $c(x)^{rc} \in C$. Since $C = (f_0 | u f_1)$, there exist
$a_0(x), a_1(x) \in R[x]$ such that

$c(x) = a_0(x) f_0(x) + a_1(x) u f_1(x)$.

Taking the reciprocal and by repeated use of Lemma 3.3 and the fact that $f_0(x)$ and $f_1(x)$
are self-reciprocal we have

$c(x)^* = a_0(x)^* f_0(x) + a_1(x)^* u x^m f_1(x)$.

This gives that $c^*(x)$ is in $C$. Since $C$ is cyclic, $x^{n-t-1}c(x) = c_0 x^{n-t-1} + c_1 x^{n-t} + \cdots + c_t x^{n-1} \in C$.
It was also assumed that $u + ux + \cdots + ux^{n-1} \in C$, which leads to

$u + ux + \cdots + ux^{n-1} + c_0 x^{n-t-1} + c_1 x^{n-t} + \cdots + c_t x^{n-1} \in C$.

This is equal to $u + ux + \cdots + ux^{n-t-2} + (u + c_0)x^{n-t-1} + \cdots + (u + c_t)x^{n-1} = u + ux +
\cdots + ux^{n-t-2} + \overline{c_0}x^{n-t-1} + \cdots + \overline{c_t}x^{n-1}$, which is precisely $(c^*(x)^{rc})^* = c(x)^{rc} \in C$. □

Corollary 3.10 Let $C$ be a cyclic code with odd length $n$. Then if $u + ux + \cdots + ux^{n-1} \in C$
and if there exists an $i$ such that

$2^i \equiv -1 \pmod{n}$, (7)
then the code $C$ is a reversible-complement code.
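Condition (7) is easy to test numerically. The Python sketch below (the function name is an assumption) walks through the powers of 2 modulo $n$ and returns the smallest exponent $i$ with $2^i \equiv -1 \pmod{n}$, or `None` when the powers return to 1 without ever reaching $-1$.

```python
def minus_one_exponent(n):
    """Smallest i >= 1 with 2^i = -1 (mod n), or None if no such i exists.

    Requires n odd and n >= 3, so that 2 is invertible modulo n and the
    sequence of powers eventually returns to 1.
    """
    if n < 3 or n % 2 == 0:
        raise ValueError("n must be an odd integer >= 3")
    power, i = 2 % n, 1
    while True:
        if power == n - 1:
            return i
        if power == 1:
            return None
        power = (2 * power) % n
        i += 1

print(minus_one_exponent(65))  # 6, since 2^6 = 64 = -1 mod 65
print(minus_one_exponent(63))  # None
```

Lengths of the form $n = 2^m + 1$ always pass the test with $i = m$, whereas lengths $n = 2^m - 1$ with $m > 2$ do not.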

Proof. Let $C = (f_0 | u f_1)$ be a cyclic code. The polynomials $f_i$ are divisors of $x^n - 1$
in $\mathbb{F}_2[x]$. The decomposition into the product of minimal polynomials is given by $x^n - 1 =
\prod M_i(x)$. Each $M_i$ corresponds to a cyclotomic class $Cl(i)$. Equation (7) gives that $Cl(1)$ is
reversible and hence all the cyclotomic classes are reversible. Thus each minimal polynomial
is self-reciprocal, and from Lemma 3.3 the polynomials $f_i$ are self-reciprocal. Then from
Theorem 3.9, $C$ is a reversible-complement code. □



Remark 3.11 It is obvious that the Hamming distance constraint is satisfied for a linear
code. Furthermore, from Theorem 3.8, when a cyclic code $(f_0 | u f_1)$ is reversible-complement,
$f_0$ and $f_1$ are self-reciprocal. Hence from Theorem 3.6 the code is reversible.

3.3 BCH-DNA Codes 

Now the construction of BCH-DNA codes is considered. 

Theorem 3.12 Let $C = BCH(n, \delta_0, \delta_1)$ be a BCH code over $R$ of length $n = 2^m + 1$ with $m > 1$.
Then the code $C$ is a DNA code over $R$.

Proof. Since $C$ is a cyclic code, the polynomial $\mathrm{lcm}(M_i)$, $1 \leq i \leq \delta_1 - 1$, is a codeword of
$C$. Hence the codeword $\prod_{i \geq 1} M_i = (x^n - 1)/(x - 1) = 1 + x + \cdots + x^{n-1} \in C$. Furthermore, we
have $2^m \equiv -1 \pmod{n}$. Then from Corollary 3.10 we obtain that $C$ is a DNA code. □

Example 3.13 We have the existence of a $BCH(65, 11, 9)$ code which is a DNA code with
$2^{34}$ codewords and Lee minimum distance equal to the Hamming minimum distance of 13.

More generally, by the same proof as for Theorem 3.12, we obtain a BCH-DNA code of any length
$n$ satisfying (7).

Example 3.14 The code $BCH(43, 7, 3)$ is a BCH-DNA code with $2^{72}$ codewords and minimum
Lee distance 6. The binary image by the Gray map gives an optimal binary code
$[86, 72, 6]$ JEf .

4 The GC-Weight

As explained in the introduction, DNA codes with the same GC-content in all codewords
ensure that the codewords have similar thermodynamic characteristics (e.g., melting
temperature).

Lemma 4.1 Let $C = (f_0 | u f_1)$ be a cyclic code over $R$. Then the code $u\mathrm{Tor}(C) = (u f_1)$
is the subcode of $C$ containing all codewords of $C$ that are multiples of $u$.






Proof. Let $C_u$ be the subcode of $C$ containing all codewords whose nonzero coordinates are all equal to $u$.
Then it is obvious that the code $u\mathrm{Tor}(C)$ is a subset of $C_u$. Let $c$ be a codeword of $C_u$, hence
$c = k_0(x) f_0(x) + u k_1(x) f_1(x) = u g(x)$ with $k_0, k_1, g \in R[x]$. The codewords $u g(x)$ have
coordinates 0 or $u$, so that we may write $u g(x) = u f(x)$, with $f(x)$ a binary polynomial.
Since $f_1 | f_0$, we obtain $f_1 | f$, and hence $C_u = u\mathrm{Tor}(C) = (u f_1)$. □

Theorem 4.2 The GC-weight of $C = (f_0 | u f_1)$ is given by the Hamming weight enumerator
of the binary cyclic code $(f_1)$.

Proof.

The GC-content is obtained by multiplying the codewords of $C$ by $u$, and from Lemma 4.1
we have $C_u = (u f_1)$. Hence the GC-content is given by the Hamming weight of the binary
code generated by $f_1$. □



5 Infinite Families of DNA Codes with Fixed GC-content

5.1 DNA Codes from the Simplex Codes

The binary simplex code $S_m$ is a code with parameters $[2^m - 1, m, 2^{m-1}]$ and all nonzero
codewords of weight $2^{m-1}$. This is the dual of the $[2^m - 1, 2^m - 1 - m, 3]$ Hamming code (which
is also a BCH code of designed distance 3). Then $S_m$ is a cyclic code with generator polynomial
$h^*(x)$, which is the reciprocal of the parity check polynomial $h(x) = (x^n - 1)/M_1(x)$. If $Cl(1)$
is a reversible class, then $h^*(x) = h(x)$, and it is given by $h^*(x) = \frac{x^n - 1}{M_1(x)}$. The simplex code
is optimal in the sense of the constant GC-content property. It suffices to consider the free
cyclic code over $R$ generated by $h^*(x)$. This gives a cyclic code over $R$ with $4^m$ codewords
and constant GC-weight $2^{m-1}$. Note that this DNA code contains more codewords than
the code constructed from the binary simplex code given by the so-called Construction B2 [24].

Example 5.1 For $m = 4$, respectively $m = 5$, we have a cyclic code of length 15, respectively
31, containing 256 codewords with the same GC-content equal to 8, respectively 1024
codewords with the same GC-content equal to 16. Usually the GC-content is required to
be in the range 30%-50% of the length of the code.
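The constant-weight property of the binary simplex code, on which this construction rests, can be verified directly for small $m$. The Python sketch below stores binary polynomials as integers (bit $i$ is the coefficient of $x^i$) and uses $x^4 + x + 1$ and $x^3 + x + 1$ as primitive polynomials; these choices, the use of $(x^n - 1)/M_1(x)$ itself as generator rather than its reciprocal (which only reverses the coordinate order and so leaves weights unchanged), and the function names are assumptions of the sketch.

```python
def gf2_mul(a, b):
    """Product of two binary polynomials stored as ints."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result

def gf2_divmod(a, b):
    """Quotient and remainder of binary polynomial division."""
    if b == 0:
        raise ZeroDivisionError("division by the zero polynomial")
    quotient = 0
    while a.bit_length() >= b.bit_length():
        shift = a.bit_length() - b.bit_length()
        quotient ^= 1 << shift
        a ^= b << shift
    return quotient, a

def simplex_weights(m, primitive_poly):
    """Set of nonzero weights of the cyclic code generated by (x^n - 1)/M_1(x)."""
    n = (1 << m) - 1
    g, remainder = gf2_divmod((1 << n) | 1, primitive_poly)  # x^n + 1 over F_2
    if remainder:
        raise ValueError("primitive_poly does not divide x^n - 1")
    # deg g = n - m, so a(x) g(x) with deg a < m needs no reduction mod x^n - 1.
    codewords = [gf2_mul(a, g) for a in range(1, 1 << m)]
    return {bin(c).count("1") for c in codewords}

print(simplex_weights(4, 0b10011))  # {8}
```

All 15 nonzero binary codewords of length 15 have weight 8, matching the GC-content 8 quoted in Example 5.1 for $m = 4$.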

5.2 DNA Codes from the Zetterberg Codes 

A binary code $C$ is said to be irreducible if it is the dual of a cyclic binary code generated
by a minimal polynomial associated with a primitive $n$th root of unity $\alpha$. Let $m > 0$ and
$n = 2^m + 1$; then $\mathrm{ord}_n(2) = 2m$. Let $\beta$ be a primitive element of $\mathbb{F}_{2^{2m}}$, so that $\alpha = \beta^{2^m - 1}$
is a primitive $n$th root of unity with splitting field $\mathbb{F}_{2^{2m}}$. Then the minimal polynomial
associated with $\alpha$ is denoted by $M_1 = \prod_{i \in Cl(1)} (x - \alpha^i)$, and $\deg M_1 = \mathrm{ord}_n(2) = 2m$. The
binary cyclic code $C_z$ generated by $M_1$ is called the Zetterberg code. It is easily determined
that the weights of $C_z$ are symmetric, since it is a binary code which contains the all-one
codeword $\mathbf{1}(x)$. The parameters of $C_z$ are given by the following theorem.

Theorem 5.2 ( ^ Theorem 16], \2T\ Theorem 5.4]) 
If $m \equiv 1 \pmod{2}$, then $C_z$ has parameters

$[2^m + 1, 2^m + 1 - 2m, 3]$.

$A_3 = A_{2^m - 2} =$ and $A_4 = A_{2^m - 3} = 0$.
If $m \equiv 0 \pmod{2}$, then $C_z$ has parameters

$[2^m + 1, 2^m + 1 - 2m, 5 \leq d \leq 6]$.

The asymptotic behavior of $A_i$ is given by

1 /2 m + 1 



The dual code $C_z^{\perp}$ is called the irreducible Zetterberg code. It is a cyclic code generated
by the polynomial $h^*(x)$, where $h(x) = \frac{x^n - 1}{M_1(x)}$. Since $Cl(1)$ is a reversible class, $M_1$ is
self-reciprocal and hence $h^* = h$. This gives that the dimension of $C_z^{\perp}$ is equal to $2m$. We have
the following result.

Lemma 5.3 ([16]) All the weights of $C_z^{\perp}$ are even and the nonzero weights are

$a_{2i} = (2^m + 1) m_i$

with $m_i$ a constant dependent on $i$. The (even) minimum distance $d^{\perp}$ is bounded by $d^{\perp} >$



Proposition 5.4 The code $C_0 = \left(\frac{x^n - 1}{(x - 1) M_1}\right)$ has parameters

$[2^m + 1, 2m + 1, d = \min(d^{\perp}, 2^m + 1 - d^{\perp})]$.
The weight enumerator of $C_0$ is

$\sum_i a_{2i} (x^{2i} + x^{2^m + 1 - 2i})$,

where the $a_{2i}$ are the weights of the dual Zetterberg code given by Lemma 5.3.
Proof. The generator of $C_0$ has degree $2^m - 2m$, hence the dimension is $2m + 1$. The code
$C_z^{\perp}$ is a subcode of $C_0$, and the all-one codeword $\mathbf{1}$ is in $C_0$. The weights of $C_0$ are symmetric
since it is a binary code that contains $\mathbf{1}$. Let $c_{2i}$ be a codeword of $C_z^{\perp}$ of weight $2i$. Then the
codeword $\mathbf{1} - c_{2i}$ is in $C_0$ and has weight $2^m + 1 - 2i$. Hence there are at least $a_{2i}$ codewords
in $C_0$ of weight $2i$ and at least $a_{2i}$ codewords with weight $2^m + 1 - 2i$. The total number
of codewords in $C_0$ is $2^{2m+1}$, whereas the total number of codewords in $C_z^{\perp}$ is $2^{2m}$. Hence this
gives that the weight enumerator of $C_0$ is $\sum_i a_{2i} (x^{2i} + x^{2^m + 1 - 2i})$. The minimum distance of
$C_0$ is given by the minimum of $d^{\perp}$ and $2^m + 1 - d^{\perp}$.

Theorem 5.5 Let $C = (f_1)$ be the free cyclic code over $R$ generated by $f_1 = \frac{x^n - 1}{(x - 1) M_1}$. Then
$C$ is a DNA code with $2^{2(2m+1)}$ codewords, minimum distance $d = \min(d^{\perp}, 2^m + 1 - d^{\perp})$, and
GC-weight given by

$\sum_i a_{2i} (x^{2i} + x^{2^m + 1 - 2i})$,

where the $a_{2i}$ are the weights of the dual Zetterberg code given by Lemma 5.3.

Proof. Since $f_1$ is self-reciprocal and the codeword $u\mathbf{1}$ is in $C$, from Theorem 3.9 the code
is a DNA code. From Theorem 4.2 the GC-content is given by the weight distribution of
$C_0$, which is given by Proposition 5.4. Hence the result. □



5.3 DNA Codes from the Reed-Muller Codes 

From Theorem 3.12 there exist BCH-DNA codes of length $2^m + 1$. In the following section,
we consider the construction of families of DNA codes of length $2^m - 1$ with fixed
GC-content. We begin by proving the following result.

Proposition 5.6 Let $n$ be an odd integer. Then if $\mathrm{ord}_n(2)$ is even there exists a 2-cyclotomic
class modulo $n$ which is reversible.

1. If $n = p$ is a prime, we assume that $\mathrm{ord}_n(2) = 2w$ is even, so $2^{2w} \equiv 1 \pmod{n}$. Hence
$n | (2^w - 1)(2^w + 1)$. Since $n$ is prime and cannot divide $2^w - 1$ (because of the order),
we have $2^w \equiv -1 \pmod{n}$, which gives that $Cl(1)$ is reversible.

2. If $n = p^a$, we first have to prove the following implication:

$\mathrm{ord}_{p^a}(2)$ is even $\Rightarrow$ $\mathrm{ord}_p(2)$ is even.

Assume $\mathrm{ord}_{p^a}(2)$ even and $\mathrm{ord}_p(2)$ odd. Then there exists $i > 0$ odd such that $2^i \equiv 1
\pmod{p}$, i.e., $2^i = 1 + kp$. Hence $2^{ip^{a-1}} = (1 + kp)^{p^{a-1}} \equiv 1 \pmod{p^a}$, because $(1 + kp)^{p^{a-1}} \equiv
1 + kp^a \pmod{p^{a+1}}$ (the proof of the last equality can be found in [13, Lemma 3.30]).
Hence

$2^{ip^{a-1}} \equiv 1 \pmod{p^a}$. (8)
With $i$ odd and $p^{a-1}$ odd, $\mathrm{ord}_{p^a}(2)$ is odd (because $\mathrm{ord}_{p^a}(2) | ip^{a-1}$), which is absurd.
Hence $\mathrm{ord}_p(2)$ is even, so there exists some integer $j$ such that $0 < j < \mathrm{ord}_p(2)$ and
$2^j \equiv -1 \pmod{p}$. Then from (8), we have $2^{jp^{a-1}} \equiv -1 \pmod{p^a}$. This gives that $Cl(1)$
is reversible.

3. If $n = p_1 p_2$ with $(p_1, p_2) = 1$, since $\mathrm{ord}_n(2) = \mathrm{lcm}(\mathrm{ord}_{p_1}(2), \mathrm{ord}_{p_2}(2))$ is even,
either $\mathrm{ord}_{p_1}(2)$ or $\mathrm{ord}_{p_2}(2)$ must be even. Assume that $\mathrm{ord}_{p_1}(2)$ is even. Then there
exists $1 \leq k \leq \mathrm{ord}_{p_1}(2)$ such that $2^k \equiv -1 \pmod{p_1}$. Therefore $2^k (n - p_2) \equiv -(n - p_2)
\pmod{n}$, with $k \leq \mathrm{ord}_{p_1}(2)$.

4. If $n = p_1^{a_1} p_2^{a_2}$ with $(p_1, p_2) = 1$, we know that $\mathrm{ord}_n(2) = \mathrm{lcm}(\mathrm{ord}_{p_1^{a_1}}(2), \mathrm{ord}_{p_2^{a_2}}(2))$.
Then if $\mathrm{ord}_{p_1^{a_1} p_2^{a_2}}(2)$ is even we have that either $\mathrm{ord}_{p_1}(2)$ or $\mathrm{ord}_{p_2}(2)$ is even. Therefore it
suffices to repeat the process in case 3 above.

Hence the generalization to any $n$ such that $\mathrm{ord}_n(2)$ is even.

□ 
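Reversible classes can also be found by direct enumeration, which gives a quick sanity check of the proposition for specific lengths. In the Python sketch below a class is called reversible when it is closed under negation modulo $n$; checking a single representative is enough because the class is closed under multiplication by 2. The function names are assumptions, and the trivial class $\{0\}$ is left out.

```python
def cyclotomic_cosets(n):
    """2-cyclotomic cosets modulo n (n odd), each returned as a sorted list."""
    seen, cosets = set(), []
    for s in range(n):
        if s in seen:
            continue
        coset, t = [], s
        while t not in coset:
            coset.append(t)
            t = (2 * t) % n
        seen.update(coset)
        cosets.append(sorted(coset))
    return cosets

def reversible_classes(n):
    """Nonzero cyclotomic classes Cl(s) that contain -s modulo n."""
    return [c for c in cyclotomic_cosets(n)
            if c[0] != 0 and (n - c[0]) % n in c]

# Smallest representatives of the reversible classes for n = 63.
print([c[0] for c in reversible_classes(63)])  # [7, 21]
```

For $n = 63$, where $\mathrm{ord}_n(2) = 6$ is even, the sketch finds the classes of 7 and 21; for $n = 65$, where $2^6 \equiv -1$, every class is reversible.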

Now we consider the family of second order Reed-Muller codes [30, Ch. 13-15]. The
punctured second order Reed-Muller code $R^*(2, m)$ is a cyclic code of length $2^m - 1$,
dimension $1 + m + \frac{(m-1)m}{2}$, and generator polynomial $g(x) = \prod_{1 \leq w_2(s) \leq m-3} (x - \alpha^s)$, $1 \leq s \leq 2^m - 2$,
with $\alpha$ a primitive element of $\mathbb{F}_{2^m}$.
$R^*(2, m)$ contains the all-one codeword and has minimum Hamming distance $2^{m-2} - 1$. The
code $R^*(2, m)$ is a subset of the binary BCH code $BCH_2(2^m - 1, 2^{m-2} - 1)$ of designed
distance $2^{m-2} - 1$ and dimension $2^m - 1 - m(2^{m-3} - 1)$. The binary weight distribution
of $R(2, m)$ is given in [30, p. 443]. Since the codes $R(2, m)$ are affine-invariant, we can
apply [30, Theorem 14, Ch. 8] to determine the weight distribution of the punctured code
$R(2, m)^*$. Since this is a well known infinite class of codes with known weight distribution,
it will be used to construct DNA codes with the reverse-complement constraint and also
good GC-content.

Let $n = 2^m - 1$ be a positive integer. If $m$ is even then from Proposition 5.6 there exists
at least one reversible class modulo $n$. Let $g(x) \in \mathbb{F}_2[x]$ be a monic divisor of $x^n - 1$ which
generates the code $RM(2, m)^*$. This can be decomposed as $g(x) = g_1(x) g_2(x)$ such that
$g_1(x)$ is the product of all non-self-reciprocal minimal polynomials that divide $g(x)$, and
$g_2(x)$ is the product of all self-reciprocal minimal polynomials that divide $g(x)$. Hence $g_2(x)$
is a self-reciprocal polynomial, and the all-one codeword is contained in the code generated
by $g_2(x)$. From Theorem 3.9 we then have a DNA code $C = (g_2(x))$. This code contains
at least $A_i$ codewords with GC-content equal to $i$, where the $A_i$ are the coefficients of the
weight enumerator of $RM^*(2, m)$.

Example 5.7 If $m = 4$, then $n = 15$ and there are 5 cyclotomic classes. The reversible nonzero
classes are $Cl(3)$ and $Cl(5)$, and the generator of $RM(2, 4)^*$ is the minimal polynomial associated with
$Cl(1)$. Thus we cannot apply the procedure above.
If $m = 6$, then $n = 63$, and $Cl(7)$ and $Cl(21)$ are reversible classes. Furthermore $M_7 M_{21} | g(x)$,
the generator polynomial of $RM(2, 6)^*$. Hence $(M_7 M_{21}(x))$ is a DNA code over $R$ since it
is generated by a self-reciprocal polynomial and contains the codeword $u\mathbf{1}$. $RM(2, 6)^*$ is a
subcode of $(M_7 M_{21}(x))$. For a given weight $i$, this code contains at least $A_i$ codewords of
weight $i$, where the $A_i$ are the coefficients of the weight polynomial of $RM(2, 6)^*$. These
weights are given in the following table.



i            A_i
47 or 15     2604
23 or 39     291648
27 or 35     888832
31           3011220

Table 1: The minimum number of codewords of weight $i$ in the DNA code $(M_7(x) M_{21}(x))$